Abstract


The  purpose  of  this  thesis  is  to  cons 
feedback  queries  in  information  retrieval 
optimal  retrieval  rule  is  derived  using  the 
decision  rule.  Three  probabilistic  models  a 
queries  to  be  used  in  the  models  are  presented 
which  are  required  to  construct  these  queries 
based  on  relevance  information  from  the 
retrieved  documents.  Finally,  the  effect  of  d 
from  the  optimal  query  in  one  of  the  thr 
retrieval  performance  is  analysed. 
Chapter 1


Introduction 

The purpose of constructing feedback queries [14] is to
give the user more relevant documents. Documents and
queries in information retrieval systems are usually
represented by n-dimensional vectors whose components are
the keywords (keywords, index terms and terms will be used
interchangeably). An example is the SMART System [15]. In
response to a user query Q, the system retrieves documents
that are "close" to the query. A simple way to measure the
"closeness" or "retrieval status value" of a document with
respect to a query is the number of keywords or terms in
common between them. The terms may be weighted with respect
to their importance in the vector. So we can also consider
a document or a query as a set of weighted terms.

Since what the user wants can seldom be fully expressed
by a set of keywords (i.e. his query) and the notion of
closeness may not be exactly represented by the measure
indicated above, it is usual that only some of the retrieved
documents are found to be of interest to him. If the user
is not satisfied with the retrieved documents, he may
request the system to reformulate the query by indicating
which of the retrieved documents are relevant to him. The
system then attempts to make use of the feedback information
to obtain better retrieval performance.

Rocchio and Salton [14] suggest a practical method for
modifying queries to achieve better performance. The new
query is:

$$ Q' = Q + a \sum_{x \in Q(R)} x \; - \; b \sum_{x \in Q(I)} x $$

where $a \ge 0$, $b > 0$ are parameters to be determined and
Q(R) and Q(I) are respectively the sets of relevant documents
and irrelevant documents retrieved by the initial query, Q.

Yu, Luk and Cheung [18] analyse necessary and
sufficient conditions under which the parameters a and b
make Q' a better query than Q. The exact values of a and b
that produce optimal results are not determined. Their
experimental results indicate that choosing $a = 1/|Q(R)|$ and
$b = 1/|Q(I)|$ works well in practice.
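The query modification of Rocchio and Salton can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `rocchio_update` and the sample vectors are hypothetical, and the default values of a and b are the choice reported in [18], not values prescribed by this thesis.

```python
def rocchio_update(query, relevant, irrelevant, a=None, b=None):
    """Return Q' = Q + a * sum(relevant docs) - b * sum(irrelevant docs)."""
    n = len(query)
    # Assumed defaults: a = 1/|Q(R)| and b = 1/|Q(I)|, as suggested in [18].
    if a is None:
        a = 1.0 / len(relevant) if relevant else 0.0
    if b is None:
        b = 1.0 / len(irrelevant) if irrelevant else 0.0
    new_query = list(query)
    for doc in relevant:
        for i in range(n):
            new_query[i] += a * doc[i]
    for doc in irrelevant:
        for i in range(n):
            new_query[i] -= b * doc[i]
    return new_query


# Hypothetical data: one initial query, two relevant and one irrelevant document.
q = [1.0, 1.0, 0.0, 0.0]
rel = [[1, 1, 1, 0], [1, 0, 1, 0]]
irr = [[0, 1, 0, 1]]
q_new = rocchio_update(q, rel, irr)
```

Terms that occur mainly in the relevant documents gain weight, while terms that occur mainly in the irrelevant documents may end up with a negative weight.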

The focus of previous research [13,17] is mainly on the
optimality of queries of model 1 presented in chapter 3 of
this thesis. Robertson and Sparck Jones [13] derive an
optimal query using Bayes' theorem. Van Rijsbergen [17]
uses the Bayes decision rule to derive results similar to
those of [13] and further extends the model to a special
case of term dependence. Kraft and Bookstein [9] use the
Neyman-Pearson lemma [10] to determine the optimal range of
the retrieval status value.
This thesis analyses optimal queries in information
retrieval systems. In chapter 2, an optimal retrieval rule
is derived using the Neyman-Pearson Decision Rule [6]. In
chapter 3, this optimal retrieval rule is applied to three
rather common models in information retrieval. In each
case, an optimal query is derived. In chapter 4, the
parameters required to construct the optimal
queries in the three models are estimated, based on
relevance information supplied by the user. In references
[13,17] an attempt is made to estimate the parameters in
model 1 based on relevance information of random documents.
The estimations in chapter 4, however, make use of retrieved
documents. In chapter 5, the effect of deleting a term from
the optimal query in model 1 is analysed. A scheme to rank
the usefulness of the terms in retrieval is proposed. It is
hoped that the same approach can be generalized to the case
of deleting more index terms and to other models. The
deletion of terms from feedback queries or queries
containing too many index terms is necessary in order to
speed up the process of retrieval.


Chapter 2

The  System,  Performance  Measure  and  Optimal  Retrieval  Rule 

2.1 The System

Documents and queries are represented by n-dimensional
vectors whose components are associated with the index terms
of the document space. It is assumed that the higher the
value of the i-th component, the more important is the i-th
term to the document. The value of a term in a vector is
known as its weight.

The system retrieves documents that are 'close' to the
query. The 'closeness' or 'retrieval status value' of a
document vector x with respect to the query vector Q is
measured by the system by means of a real-valued function f.
A query Q retrieves document x if and only if f(Q,x) > K
where K is a threshold value. In other words, if x
satisfies f(Q,x) > K then x is assumed by the system to be
close to the query Q.

Heine [8] has discussed several forms of f. Here a
simple matching function will be used:

$$\begin{aligned}
f(Q,x) &= \sum_{i=1}^{n} q_i x_i \qquad (2.1.1) \\
       &= Q \cdot x \qquad \text{(dot product)} \\
       &= Q\,x^T \qquad \text{(vector multiplication)}
\end{aligned}$$

where $Q = (q_1, q_2, \ldots, q_n)$,
      $x = (x_1, x_2, \ldots, x_n)$
and $x^T$ is the transpose of x.
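As a small illustration of the matching function (2.1.1) and the threshold rule f(Q,x) > K, the following sketch scores a few documents against a query. The function names and the sample vectors are hypothetical and serve only as an example.

```python
def matching_score(query, doc):
    """Simple matching function f(Q, x): the dot product of the two vectors."""
    return sum(q * x for q, x in zip(query, doc))


def retrieve(query, docs, threshold):
    """Return the indices of the documents x with f(Q, x) > K."""
    return [i for i, doc in enumerate(docs)
            if matching_score(query, doc) > threshold]


# Hypothetical weighted query and three binary documents.
query = [2, 1, 0, 1]
docs = [[1, 0, 1, 0], [1, 1, 0, 1], [0, 1, 1, 0]]
retrieved = retrieve(query, docs, threshold=1)
```

With K = 1 the third document, whose score equals the threshold, is not retrieved, because the rule uses a strict inequality.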

2.2 Performance Measure - recall and precision

Whether  a  document  is  relevant  or  not  relevant  to  a 
query  is  entirely  dependent  on  the  user.  An  ideal  system 
retrieves  all  relevant  documents  and  no  irrelevant 
documents.  However,  retrieval  by  means  of  a  closeness 
function  rarely  yields  the  desired  result.  Thus  the 
objective  is  to  retrieve  as  few  irrelevant  documents  as 
possible  while  retrieving  a  certain  number  of  relevant 
documents.  This  may  be  phrased  in  terms  of  the  two  most 
common  retrieval  performance  measures  in  information 
retrieval,  recall  and  precision,  defined  as  follows: 

Recall = Probability that a document is retrieved given
         that the document is relevant.

Precision = Probability that a document is relevant given
            that the document is retrieved.
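For a finite collection these two probabilities are usually estimated by counting. The sketch below does this for sets of document identifiers; the function name and the identifiers are hypothetical.

```python
def recall_precision(retrieved, relevant):
    """Estimate recall and precision from two sets of document identifiers."""
    hits = len(retrieved & relevant)           # relevant documents that were retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: four documents retrieved, three relevant overall.
recall, precision = recall_precision({1, 2, 3, 4}, {2, 4, 5})
```

Here two of the three relevant documents are retrieved, and two of the four retrieved documents are relevant.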

It is clear that the above objective is equivalent to
obtaining the highest precision at any given level of
recall. We now present a retrieval strategy that achieves
this aim. This strategy will be shown to be equivalent to
minimizing type 2 (beta) error for fixed type 1 (alpha)
error in statistical decision theory.

2.3 Optimal Retrieval Rule
Let

P(A) be the probability that event A occurs,
P(A|B) be the conditional probability of occurrence
    of the event A, given the event B,
P(A,B) be the probability that events A and B co-occur,
DR be the set of documents retrieved by Q,
R be the set of documents relevant to the query Q,
I be the set of documents irrelevant to the query Q,
D be the set of documents in the document space,
|Y| be the number of elements in the set Y,
$C_1$ be the event that a document is relevant to Q,
$C_2$ be the event that a document is irrelevant to Q,
dr be the event that a document is retrieved by Q,
$\theta$ be the probability that a randomly chosen document
    is relevant to the query Q,
    i.e. $\theta = |R|/|D|$ or $P(C_1)$,
$\phi(x)$ be the probability that document x is relevant to
    the query Q,
    i.e. $P(C_1 \mid x)$.

Let

$$ \alpha = P(\lnot dr \mid C_1) \quad \text{and} \quad \beta = P(dr \mid C_2). $$

Then

$$ \text{recall} = 1 - \alpha $$

$$\begin{aligned}
\text{precision} &= P(C_1 \mid dr) \\
 &= \frac{P(dr \mid C_1)\,P(C_1)}{P(dr)} \\
 &= \frac{P(dr \mid C_1)\,P(C_1)}{P(dr \mid C_1)\,P(C_1) + P(dr \mid C_2)\,P(C_2)} \\
 &= \frac{(1-\alpha)\,\theta}{(1-\alpha)\,\theta + \beta\,(1-\theta)}
\end{aligned}$$

where $\theta = P(C_1)$.

Since $\theta$ is independent of any retrieval strategy, and
for fixed alpha the last expression decreases as beta
increases, it is easy to see:

    at fixed alpha, beta is minimized if and only if
    at fixed recall level, precision is maximized.
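The dependence of precision on beta at a fixed recall level can be checked numerically. The following sketch evaluates precision = (1-alpha)*theta / ((1-alpha)*theta + beta*(1-theta)); the function name and the numerical values are hypothetical.

```python
def precision_from_errors(alpha, beta, theta):
    """Precision as a function of alpha = P(not dr|C1), beta = P(dr|C2), theta = P(C1)."""
    hit_mass = (1.0 - alpha) * theta           # P(dr, C1)
    return hit_mass / (hit_mass + beta * (1.0 - theta))


# Fixed recall (alpha = 0.2) and fixed theta; only beta varies.
theta = 0.05
values = [precision_from_errors(0.2, beta, theta) for beta in (0.05, 0.1, 0.3)]
```

The values decrease as beta grows, so at a fixed recall level the strategy with the smallest beta has the highest precision.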

Therefore, the objective of maximizing precision at any
given level of recall can be met by the Neyman-Pearson
Decision Rule [6] and the retrieval rule that achieves the
objective is:

$$ \text{retrieve } x \text{ if and only if } \frac{P(x \mid C_1)}{P(x \mid C_2)} > K \qquad (2.3.1) $$

The  following  shows  that  the  retrieval  rule  is  optimal, 
in  the  sense  that  it  can  rank  documents  in  order  of  their 
probability  of  relevance. 

The relation between the probability of relevance,
$\phi(x)$, of document x, and the ratio
$P(x \mid C_1)/P(x \mid C_2)$ is:

$$\begin{aligned}
\phi(x) &= P(C_1 \mid x) \\
 &= \frac{P(x \mid C_1)\,P(C_1)}{P(x)} \\
 &= \frac{P(x \mid C_1)\,P(C_1)}{P(x \mid C_1)\,P(C_1) + P(x \mid C_2)\,P(C_2)} \\
 &= \frac{P(x \mid C_1)\,\theta}{P(x \mid C_1)\,\theta + P(x \mid C_2)\,(1-\theta)} \\
 &= \frac{[P(x \mid C_1)/P(x \mid C_2)]\,\theta}{[P(x \mid C_1)/P(x \mid C_2)]\,\theta + (1-\theta)}
\end{aligned}$$

We immediately have, for any documents $x_1$, $x_2$:

$$ \phi(x_1) > \phi(x_2) \text{ if and only if } \frac{P(x_1 \mid C_1)}{P(x_1 \mid C_2)} > \frac{P(x_2 \mid C_1)}{P(x_2 \mid C_2)} $$

Hence, $P(x \mid C_1)/P(x \mid C_2)$ also ranks documents in order of
their probability of relevance. As a consequence,
retrieving documents in decreasing order of
$P(x \mid C_1)/P(x \mid C_2)$
yields the highest expected number of relevant documents for
any given number of documents retrieved.
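The monotone relation between the likelihood ratio and the probability of relevance can be illustrated as follows. The function name, the document labels and the ratio values are hypothetical.

```python
def probability_of_relevance(ratio, theta):
    """phi(x) expressed through the ratio P(x|C1)/P(x|C2) and theta = P(C1)."""
    return ratio * theta / (ratio * theta + (1.0 - theta))


# Hypothetical likelihood ratios for four documents.
ratios = {"d1": 0.4, "d2": 7.5, "d3": 1.0, "d4": 2.2}
theta = 0.1

by_ratio = sorted(ratios, key=lambda d: ratios[d], reverse=True)
by_phi = sorted(ratios,
                key=lambda d: probability_of_relevance(ratios[d], theta),
                reverse=True)
```

Both orderings coincide, and a ratio of exactly 1 leaves the probability of relevance equal to theta.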

In the next chapter, we shall construct optimal feedback
queries under three different distributions of term weights
on the documents, where an optimal feedback query Q*,

$$ Q^* = (w_1, w_2, \ldots, w_n) $$

is a query satisfying

$$ \frac{P(x_1 \mid C_1)}{P(x_1 \mid C_2)} > \frac{P(x_2 \mid C_1)}{P(x_2 \mid C_2)} \iff f(Q^*,x_1) > f(Q^*,x_2) \quad \text{for any documents } x_1, x_2 $$

i.e. a query Q* that can rank documents in order of their
probability of relevance.
Chapter 3

The  Models  and  Optimal  Queries 

This  chapter  presents  three  commonly  used  models  in 
information  retrieval.  The  optimal  queries  in  the  models 
are  derived,  making  use  of  the  optimal  retrieval  rule. 

Let $T_i$, $1 \le i \le n$, be a random variable associated
with the weights of the i-th term. The models are
characterized by the distributions assumed on
$T = (T_1, \ldots, T_n)$.

3.1 Model 1

Each document is a binary vector, i.e. its i-th
component is 1 or 0, depending respectively on the presence
or the absence of the i-th term in the document, $1 \le i \le n$.
Furthermore, when restricted to the relevant documents of Q,
the $T_i$'s are mutually independent; when restricted to the
irrelevant documents of Q, the $T_i$'s are also mutually
independent.

This model has been used in the analysis of various
information retrieval processes [13,19,21]. Optimal queries
derivable from this model are found to be "first order" or
linear approximations to those obtainable in a binary model
which incorporates the dependence of terms [Appendix 1].


Let $p_i = P(T_i = 1 \mid C_1)$,
    $r_i = P(T_i = 1 \mid C_2)$
and $x = (x_1, \ldots, x_n)$ be a document.

Then, by the independence of the $T_i$'s and the binary nature
of the term weights,

$$\begin{aligned}
\frac{P(T=x \mid C_1)}{P(T=x \mid C_2)}
 &= \frac{\prod_{i=1}^{n} p_i^{x_i}\,(1-p_i)^{1-x_i}}{\prod_{i=1}^{n} r_i^{x_i}\,(1-r_i)^{1-x_i}} \\
 &= \prod_{i=1}^{n} \left( \frac{p_i/(1-p_i)}{r_i/(1-r_i)} \right)^{x_i} \frac{1-p_i}{1-r_i}
\end{aligned}$$

$$ \Rightarrow \log \frac{P(T=x \mid C_1)}{P(T=x \mid C_2)} = \sum_{i=1}^{n} \log\left( \frac{p_i/(1-p_i)}{r_i/(1-r_i)} \right) x_i + \sum_{i=1}^{n} \log \frac{1-p_i}{1-r_i} $$

Let

$$ w_i = \log \frac{p_i/(1-p_i)}{r_i/(1-r_i)} $$

Then

$$ \log \frac{P(T=x \mid C_1)}{P(T=x \mid C_2)} = \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \log \frac{1-p_i}{1-r_i} $$

Let $Q^* = (w_1, w_2, \ldots, w_n)$.

Then

$$ f(Q^*,x) = \sum_{i=1}^{n} w_i x_i = \log \frac{P(T=x \mid C_1)}{P(T=x \mid C_2)} - \sum_{i=1}^{n} \log \frac{1-p_i}{1-r_i} $$
It is clear that for any two documents $x_1$ and $x_2$,
$f(Q^*,x_1) > f(Q^*,x_2)$ if and only if

$$ \log \frac{P(T=x_1 \mid C_1)}{P(T=x_1 \mid C_2)} > \log \frac{P(T=x_2 \mid C_1)}{P(T=x_2 \mid C_2)} $$


Since log is a monotonic increasing function of its
argument, the ranking of documents by Q* is equivalent to
that by the optimal retrieval rule. In other words, with
respect to model 1, Q* is an optimal query.
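The construction of the optimal query for model 1 can be sketched as follows. The code computes the weights $w_i = \log\{[p_i/(1-p_i)]/[r_i/(1-r_i)]\}$ and compares the query score with the log likelihood ratio for every binary document. The function names and the probabilities are hypothetical.

```python
import math
from itertools import product


def model1_weights(p, r):
    """Weights w_i = log( [p_i/(1-p_i)] / [r_i/(1-r_i)] ) of the optimal query."""
    return [math.log((pi / (1 - pi)) / (ri / (1 - ri))) for pi, ri in zip(p, r)]


def log_likelihood_ratio(x, p, r):
    """log P(T=x|C1)/P(T=x|C2) for independent binary terms."""
    total = 0.0
    for xi, pi, ri in zip(x, p, r):
        total += math.log(pi if xi else 1 - pi) - math.log(ri if xi else 1 - ri)
    return total


def score(x, w):
    """f(Q*, x), the simple matching function applied to the optimal query."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical term probabilities in relevant (p) and irrelevant (r) documents.
p = [0.8, 0.5, 0.3]
r = [0.2, 0.4, 0.3]
w = model1_weights(p, r)
offset = sum(math.log((1 - pi) / (1 - ri)) for pi, ri in zip(p, r))
docs = list(product((0, 1), repeat=3))
```

The score differs from the log likelihood ratio only by the document-independent constant `offset`, so both give the same ranking. A term with $p_i = r_i$ receives weight zero.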

Yu and Salton [20] suggest ranking terms in descending
order of $[p_i/(1-p_i)]/[r_i/(1-r_i)]$. Robertson and Sparck Jones
[13] show that taking the logarithm of that expression gives
an optimal result, although the derivation is different from
that given here. Van Rijsbergen [17] uses the Bayes
decision rule to derive the same result.

3.2 Model 2

In  this  model,  the  frequency  of  occurrence  of  any  term 
in  the  relevant  documents  follows  a  Poisson  distribution 
[3,7];  its  distribution  in  the  irrelevant  documents  is  also 
Poisson  with  a  different  parameter.  Furthermore,  the 
frequencies  of  occurrences  of  the  terms  in  the  set  of 
relevant  documents  (and  in  the  set  of  irrelevant  documents) 
are independent. This can be considered as a slight
modification of the linked-two-Poisson model [2].

More precisely, when restricted to R, each $T_i$ takes on
a Poisson distribution with parameter $u_i$ and all the $T_i$'s
are mutually independent. When restricted to I, each $T_i$ is
Poisson distributed with parameter $v_i$ and the $T_i$'s are again
mutually independent. It follows that $E[T_i \mid C_1] = u_i$ and
$E[T_i \mid C_2] = v_i$.
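As a numerical illustration of this model, the sketch below evaluates the log likelihood ratio of a term-frequency vector in two ways: directly from the Poisson probabilities, and through the closed form $\sum (v_i - u_i) + \sum x_i \log(u_i/v_i)$ to which the ratio simplifies algebraically. The function names and parameter values are hypothetical.

```python
import math


def poisson_log_pmf(k, mean):
    """log of the Poisson probability P(T = k) with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_ratio_direct(x, u, v):
    """log P(T=x|C1)/P(T=x|C2), term by term from the Poisson probabilities."""
    return sum(poisson_log_pmf(xi, ui) - poisson_log_pmf(xi, vi)
               for xi, ui, vi in zip(x, u, v))


def log_ratio_closed(x, u, v):
    """Closed form: sum(v_i - u_i) + sum(x_i * log(u_i / v_i))."""
    constant = sum(vi - ui for ui, vi in zip(u, v))
    return constant + sum(xi * math.log(ui / vi) for xi, ui, vi in zip(x, u, v))


# Hypothetical Poisson means in relevant (u) and irrelevant (v) documents.
u = [2.0, 0.5, 1.0]
v = [1.0, 1.5, 1.0]
x = [3, 0, 2]
```

Only the second sum depends on the document, which is why a query with weights $\log(u_i/v_i)$ reproduces the ranking by the likelihood ratio.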


By the independence assumption,

$$\begin{aligned}
\frac{P(T=x \mid C_1)}{P(T=x \mid C_2)}
 &= \frac{\prod_{i=1}^{n} P(T_i = x_i \mid C_1)}{\prod_{i=1}^{n} P(T_i = x_i \mid C_2)} \\
 &= \frac{\prod_{i=1}^{n} \left( u_i^{x_i} e^{-u_i} / x_i! \right)}{\prod_{i=1}^{n} \left( v_i^{x_i} e^{-v_i} / x_i! \right)} \\
 &= \exp\left[ \sum_{i=1}^{n} (v_i - u_i) \right] \prod_{i=1}^{n} (u_i/v_i)^{x_i}
\end{aligned}$$

Let $w_i = \log(u_i/v_i)$. Then

$$ \log \frac{P(T=x \mid C_1)}{P(T=x \mid C_2)} = \sum_{i=1}^{n} (v_i - u_i) + \sum_{i=1}^{n} w_i x_i $$

Let $Q^* = (w_1, w_2, \ldots, w_n)$.
It is clear that Q* is an optimal query with respect to
model 2.


3.3 Model 3

Here the joint distribution of the weights of the terms
in the relevant documents is multivariate normal [11]; in
the irrelevant documents, it is also multivariate normal.

Under this assumption, f(Q,x) is normally distributed
[11]. This model is consistent with the Swets model [16],
though the assumption here is somewhat stronger.

Recently,  there  have  been  some  doubts  about  the 
validity  of  assuming  normal  distribution  for  term  weights 
over  documents.  However,  if  the  correlations  of  terms  are 
removed  by  some  procedures  such  as  factor  analysis  [1,4,12], 
then  it  is  very  likely  that  new  terms  are  normally  and 
independently  distributed. 

A more precise characterization of the model is as
follows. When restricted to the relevant documents, T has
an n-dimensional normal distribution with mean vector
$\mathbf{u} = (u_1, \ldots, u_n)$ and covariance matrix $S_1$; when
restricted to the irrelevant documents, it has an
n-dimensional normal distribution with mean vector
$\mathbf{v} = (v_1, \ldots, v_n)$ and covariance matrix $S_2$.

By the hypothesis, $u_i = E[T_i \mid C_1]$ and $S_1(i,j)$ is the
covariance of $T_i$ and $T_j$ when restricted to the relevant
documents. The optimal retrieval rule for the continuous
case is:

$$ \text{retrieve } x \text{ if and only if } \frac{p(x \mid C_1)}{p(x \mid C_2)} > K $$

where $p(x \mid C_1)$ and $p(x \mid C_2)$ are the probability density
functions of T when restricted to the relevant documents and
irrelevant documents respectively.

Let $\pi \approx 3.1416$,
    $|A|$ be the determinant of matrix A,
    $A^{-1}$ be the inverse of matrix A
and $x^T$ be the transpose of vector x.

Then 

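$$ p(x \mid C_1) = (2\pi)^{-n/2}\,|S_1|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x-\mathbf{u})^T S_1^{-1} (x-\mathbf{u}) \right\} $$
$$ p(x \mid C_2) = (2\pi)^{-n/2}\,|S_2|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x-\mathbf{v})^T S_2^{-1} (x-\mathbf{v}) \right\} $$

It follows that ranking documents x in increasing order of

$$ (x-\mathbf{u})^T S_1^{-1} (x-\mathbf{u}) - (x-\mathbf{v})^T S_2^{-1} (x-\mathbf{v}) \qquad (3.3.1) $$

is equivalent to ranking x in decreasing order of
$p(x \mid C_1)/p(x \mid C_2)$, since the logarithm of the ratio
equals a constant minus one half of (3.3.1).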

There are two important subcases of (3.3.1).

(i) when $S_1 = S_2 = S$,
the ranking procedure can be reduced to a linear form:

$$ x^T S^{-1} (\mathbf{u} - \mathbf{v}) \qquad (3.3.2) $$

which is the well known Fisher linear discriminant [5] for
general 2-way classification problems.

(ii) when $S_1 = S_2 = S$ and all index terms are independent,
(3.3.2) can be further reduced to

$$ \sum_{i=1}^{n} w_i x_i $$

where $w_i = (u_i - v_i)/s_i^2$, $u_i$ and $v_i$ are the means
$E[T_i]$ of the i-th term weight in the relevant and irrelevant
documents, and $s_i^2 = Var[T_i]$,
and $Q^* = (w_1, w_2, \ldots, w_n)$ is the optimal query under the
assumptions.
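Subcase (ii) can be illustrated with a short sketch that builds the weights $w_i = (u_i - v_i)/s_i^2$ and checks them against expression (3.3.1) for a diagonal common covariance matrix. The function names and the numerical values are hypothetical.

```python
def independent_normal_weights(u, v, var):
    """Weights w_i = (u_i - v_i) / s_i^2 for independent terms with common variances."""
    return [(ui - vi) / si for ui, vi, si in zip(u, v, var)]


def quadratic_form_difference(x, u, v, var):
    """Expression (3.3.1) when S_1 = S_2 is diagonal with entries var."""
    return sum(((xi - ui) ** 2 - (xi - vi) ** 2) / si
               for xi, ui, vi, si in zip(x, u, v, var))


def linear_score(x, w):
    """f(Q*, x) for the query of weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical means in relevant (u) and irrelevant (v) documents, and variances.
u = [3.0, 1.0, 2.0]
v = [1.0, 1.0, 0.0]
var = [2.0, 1.0, 4.0]
w = independent_normal_weights(u, v, var)
x1 = [2.0, 5.0, 1.0]
x2 = [0.0, 1.0, 3.0]
```

For any two documents the difference of (3.3.1) equals minus twice the difference of the linear scores, so a higher linear score corresponds to a smaller value of (3.3.1).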
Chapter 4

Estimation  of  Parameters  in  the  3  Models 

4.1 Introduction to Parameter Estimations

In a realistic retrieval environment, the user
specifies the content of what he intends to retrieve by a
set of keywords. These keywords may be weighted by the user
and/or the system. However, the user may not be satisfied
with the performance of the initial query Q formed from the
keywords specified. The system will then be required to
modify the query Q after the user identifies the relevant
documents in the retrieved set of documents.

In this chapter, it is assumed that

$$ Q = (q_1, q_2, \ldots, q_r, 0, \ldots, 0) $$

where each $q_i$ is a positive integer for $i \le r$.

The optimal queries of the three models have been
discussed and derived in chapter 3. The following
parameters can be estimated, and the estimation process will
be presented in this chapter.

    $p_i$, $r_i$ in model 1
    $u_i$, $v_i$ in model 2
    $\mathbf{u}$, $\mathbf{v}$, $S_1$, $S_2$ in model 3

These parameters are needed to construct the optimal
queries in the three models. Furthermore, when the above
parameters have been estimated, the number of relevant
documents in the collection can also be estimated.

The estimation process for the parameters in R (the set
of relevant documents) is the same as that for the
parameters in I (the set of irrelevant documents). Hence
random variables defined in this chapter are restricted to R
unless explicitly stated otherwise. Thus $T_i$ will be taken
to be $T_i \mid C_1$.

Let

$$ u_i' = E[T_i \mid dr] \quad \text{and} \quad p_i' = P(T_i = 1 \mid dr). $$

After the user identifies the relevant documents in the
retrieved set of documents, both $u_i'$ and $p_i'$ can easily be
estimated. Clearly, when $T_i$ takes on binary values,
$u_i' = p_i'$.

The following proposition shows that the parameters $u_i$
and $p_i$ may be estimated as $u_i'$ and $p_i'$ in models 1 and 2 for
those terms absent in the original query Q.


Proposition 4.1.1

If the $T_j$'s, $1 \le j \le n$, are independent,
then $u_i' = u_i$ for $i > r$.

Proof

By definition,

$$ u_i = \sum_{t} t\,P(T_i = t) \quad \text{and} \quad u_i' = \sum_{t} t\,P(T_i = t \mid dr). $$

It is sufficient to show
$P(T_i = t) = P(T_i = t \mid dr)$ or equivalently
$P(dr) = P(dr \mid T_i = t)$ for any t.
Since $i > r$, $T_i$ is independent of $T_1, \ldots, T_r$, so

$$\begin{aligned}
P(dr \mid T_i = t) &= \sum_{c=K+1}^{\infty} P\left( \sum_{j=1}^{r} q_j T_j = c \;\Big|\; T_i = t \right) \\
 &= \sum_{c=K+1}^{\infty} P\left( \sum_{j=1}^{r} q_j T_j = c \right) \\
 &= P(dr).
\end{aligned}$$
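Proposition 4.1.1 can be checked by exact enumeration in the binary case. The sketch below computes $P(T_i = 1 \mid dr)$ for independent binary terms; the function names, the query and the probabilities are hypothetical.

```python
from itertools import product


def doc_probability(doc, p):
    """Probability of a binary document when the terms are independent."""
    out = 1.0
    for t, pi in zip(doc, p):
        out *= pi if t else 1.0 - pi
    return out


def conditional_term_probability(i, p, q, K):
    """P(T_i = 1 | dr), where dr is the event sum_j q_j T_j > K."""
    p_dr = 0.0
    p_joint = 0.0
    for doc in product((0, 1), repeat=len(p)):
        if sum(qj * tj for qj, tj in zip(q, doc)) > K:
            pr = doc_probability(doc, p)
            p_dr += pr
            if doc[i]:
                p_joint += pr
    return p_joint / p_dr


# Hypothetical setting: four terms, of which only the first two are in the query.
p = [0.6, 0.3, 0.2, 0.7]
q = [2, 1, 0, 0]
K = 1
outside = conditional_term_probability(2, p, q, K)   # a term absent from Q
inside = conditional_term_probability(0, p, q, K)    # a term present in Q
```

For the term outside the query the conditional probability equals the unconditional one, whereas for a query term it is larger, which is why the query terms need the separate treatment of the following sections.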

4.2 Estimation of Parameters in Model 1

The parameters to be estimated are the $p_i$'s. The next
proposition shows that if the c-th term is independent of
"sufficiently many" terms of the original query, then $p_c$ can
be estimated. This is a weaker independence assumption than
the one in Model 1.

Proposition 4.2.1

Let $Q = (q_1, \ldots, q_r, 0, \ldots, 0)$ and consider Q as a
set of keywords, $Q = \{1, 2, \ldots, r\}$.

Let $S_c$, $S_c \subseteq Q$, be a set of terms of size m.

If $T_c$ and $\sum_{i \in S_c} q_i T_i$ are independent
and $\sum_{i \in S_c} q_i > K$, then

$$ p_c = \frac{p_c'\,|R \cap DR| - \sum_{j \in Q-S_c} E_j}{A_c} \qquad (4.2.1) $$

where $E_j$, $j \in Q-S_c$, is the expected number of relevant
documents containing terms c and j and satisfying

$$ K \ge \sum_{i \in S_c} q_i T_i + \sum_{s \in Q-S_c,\; s>j} q_s T_s > K - q_j $$

and $A_c$ is the expected number of relevant documents
satisfying $\sum_{i \in S_c} q_i T_i > K$.

Proof:

Without loss of generality, let $S_c = \{r-m+1, \ldots, r\}$
and $Q-S_c = \{1, \ldots, r-m\}$. The probability that a document
is retrieved by Q can be expressed as a sum of probabilities
as follows.

By definition,

$$\begin{aligned}
P(dr) &= P\left( \sum_{i=1}^{r} q_i T_i > K \right) \\
 &= P\left( T_1 = 1,\; \sum_{i=2}^{r} q_i T_i > K - q_1 \right) + P\left( T_1 = 0,\; \sum_{i=2}^{r} q_i T_i > K \right) \\
 &= P\left( T_1 = 1,\; K \ge \sum_{i=2}^{r} q_i T_i > K - q_1 \right)
   + P\left( T_1 = 1,\; \sum_{i=2}^{r} q_i T_i > K \right)
   + P\left( T_1 = 0,\; \sum_{i=2}^{r} q_i T_i > K \right) \\
 &= P\left( T_1 = 1,\; K \ge \sum_{i=2}^{r} q_i T_i > K - q_1 \right) + P\left( \sum_{i=2}^{r} q_i T_i > K \right)
\end{aligned}$$

Expanding $P( \sum_{i=2}^{r} q_i T_i > K )$ repeatedly,

$$ P(dr) = \sum_{j=1}^{r-m} P\left( T_j = 1,\; K \ge \sum_{i=j+1}^{r} q_i T_i > K - q_j \right) + P\left( \sum_{i=r-m+1}^{r} q_i T_i > K \right) \qquad (4.2.2) $$

Then by the independence of $T_c$ and $\sum_{i=r-m+1}^{r} q_i T_i$,

$$ P(dr \mid T_c = 1) = \sum_{j=1}^{r-m} P\left( T_j = 1,\; K \ge \sum_{i=j+1}^{r} q_i T_i > K - q_j \;\Big|\; T_c = 1 \right) + P\left( \sum_{i=r-m+1}^{r} q_i T_i > K \right) $$

By definition,

$$\begin{aligned}
p_c' &= P(T_c = 1 \mid dr) = \frac{P(dr \mid T_c = 1)\,P(T_c = 1)}{P(dr)} \\
 &= \frac{\sum_{j=1}^{r-m} P\left( T_j = 1,\; K \ge \sum_{i=j+1}^{r} q_i T_i > K - q_j,\; T_c = 1 \right) + P\left( \sum_{i=r-m+1}^{r} q_i T_i > K \right) p_c}{P(dr)}
\end{aligned}$$

Multiplying both the numerator and the denominator by |R|
and solving for $p_c$, the desired result follows.
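Equation (4.2.1) can be checked by exact enumeration for independent binary terms. The sketch below computes $p_c'$, the $E_j$'s and $A_c$ from known term probabilities and then recovers $p_c$; in practice these quantities would be estimated from the retrieved documents. The function names, the number of relevant documents `n_rel` and all numerical values are hypothetical.

```python
from itertools import product


def doc_probability(doc, p):
    """Probability of a binary document within R when the terms are independent."""
    out = 1.0
    for t, pi in zip(doc, p):
        out *= pi if t else 1.0 - pi
    return out


def recover_pc(p, q, K, c, S_c, n_rel=100.0):
    """Return (p_c', value of the right hand side of (4.2.1))."""
    Q = [i for i, qi in enumerate(q) if qi > 0]       # query terms
    rest = sorted(set(Q) - set(S_c))                  # the set Q - S_c
    docs = list(product((0, 1), repeat=len(p)))

    def expected(cond):
        # Expected number of relevant documents satisfying cond.
        return n_rel * sum(doc_probability(d, p) for d in docs if cond(d))

    def score(d, terms):
        return sum(q[i] * d[i] for i in terms)

    rel_retrieved = expected(lambda d: score(d, Q) > K)            # |R n DR|
    pc_prime = expected(lambda d: d[c] and score(d, Q) > K) / rel_retrieved
    A_c = expected(lambda d: score(d, S_c) > K)
    E_sum = 0.0
    for j in rest:
        later = list(S_c) + [s for s in rest if s > j]
        E_sum += expected(lambda d, j=j, later=later:
                          d[c] and d[j] and K - q[j] < score(d, later) <= K)
    return pc_prime, (pc_prime * rel_retrieved - E_sum) / A_c


# Hypothetical example: three query terms with unit weights, K = 1,
# c is the first query term and S_c = Q - {c}.
p = [0.6, 0.3, 0.5, 0.2]
q = [1, 1, 1, 0]
pc_prime, pc = recover_pc(p, q, K=1, c=0, S_c=[1, 2])
```

In this example the proportion of retrieved relevant documents containing term c overstates $p_c$, and (4.2.1) removes the bias.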


It is clear that documents containing term j, $j \le r$,
and satisfying

$$ K \ge \sum_{i \in S_c} q_i T_i + \sum_{s \in Q-S_c,\; s>j} q_s T_s > K - q_j $$

are retrieved by Q, as are documents satisfying
$\sum_{i \in S_c} q_i T_i > K$.

However, $A_c$ has to be greater than zero in order to
estimate $p_c$. In a normal retrieval situation, K is much
less than $\sum_{i \in Q} q_i$. So it should be easy to choose
$S_c \subseteq Q$ such that $\sum_{i \in S_c} q_i > K$. Thus the
$E_j$'s, $A_c$ and hence $p_c$ can be estimated.

In the special case where all terms are independent
(which is the assumption of model 1), set $S_c = Q$ for $c > r$
and $S_c = Q - \{c\}$ for $1 \le c \le r$.


In order to study the relation between $p_c$ and $p_c'$, let
$E_j'$, $j \in Q-S_c$, be the expected number of relevant documents
containing term j and satisfying

$$ K \ge \sum_{i \in S_c} q_i T_i + \sum_{s \in Q-S_c,\; s>j} q_s T_s > K - q_j. $$

Then by equation (4.2.2),

$$ A_c = |R \cap DR| - \sum_{j \in Q-S_c} E_j', $$

and equation (4.2.1) can be written in the form:

$$ p_c = \frac{p_c'\,|R \cap DR| - \sum_{j \in Q-S_c} E_j}{|R \cap DR| - \sum_{j \in Q-S_c} E_j'} $$

After rearranging,

$$\begin{aligned}
(p_c' - p_c)\,|R \cap DR| &= \sum_{j \in Q-S_c} E_j - p_c \sum_{j \in Q-S_c} E_j' \\
 &= \sum_{j \in Q-S_c} (E_j - p_c E_j') \qquad (4.2.3)
\end{aligned}$$

Consider the case when all terms are independent. Then
$E_j = p_c E_j'$ for $j \ne c$. Therefore, when $c \notin Q$, the right
hand side of equation (4.2.3) is zero, which is exactly the
result of Proposition 4.1.1. On the other hand, when $c \in Q$,
$S_c$ is chosen to be $Q - \{c\}$, so the right hand side of
equation (4.2.3) becomes $E_c - p_c E_c'$, which is non-negative.
Hence $p_c' \ge p_c$. In general, a sufficient condition for
$p_c' \ge p_c$ is $E_j - p_c E_j' \ge 0$ for every $j \in Q-S_c$,
which is true when term c occurs frequently in the documents
counted by $E_j'$.

4.3 Estimation of Parameters in Model 2

The parameters to be estimated are the $u_i$'s. By
Proposition 4.1.1, $u_i = u_i'$, $i > r$. Thus it is sufficient
to estimate $u_i$, $1 \le i \le r$.

Let $Y = q_1 T_1 + q_2 T_2 + \ldots + q_r T_r$,
    $u = E(Y)$
and $u_{ic} = E(T_i \mid Y = c)$.

We further assume that $q_i = 1$, $1 \le i \le r$.

The estimation process consists of expressing u in
terms of other estimable parameters by means of
Proposition 4.3.1. Then the $u_i$, $1 \le i \le r$, are expressed in
terms of u.


Proposition 4.3.1

If Y is a Poisson random variable, then u can be
estimated by setting c = K+2, K+3, etc. in the following
equation:

$$ u = \frac{E(Y \mid Y \ge c)\; E_1}{E_2} $$

where
$E_1$ and $E_2$ are the expected numbers of relevant documents with
similarity greater than or equal to c and (c-1)
respectively.

Proof

After some manipulation,

$$ P(Y = y \mid Y \ge c) = \frac{u^y e^{-u} / y!}{\sum_{j=c}^{\infty} u^j e^{-u} / j!}, \quad \text{if } y \ge c $$

and

$$ E(Y \mid Y \ge c) = u \cdot \frac{\left( \sum_{j=c-1}^{\infty} u^j e^{-u} / j! \right) |R|}{\left( \sum_{j=c}^{\infty} u^j e^{-u} / j! \right) |R|} = u \cdot \frac{E_2}{E_1} $$

Solving for u, the desired result follows.
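The identity of Proposition 4.3.1 can be verified numerically for a known Poisson mean. In the sketch below the tail probabilities are obtained by summing a fixed number of terms, which is an assumption made for simplicity; the function names and the numerical values are hypothetical.

```python
import math


def poisson_pmf(j, mean):
    """Poisson probability P(Y = j)."""
    return math.exp(j * math.log(mean) - mean - math.lgamma(j + 1))


def poisson_tail(mean, c, terms=200):
    """P(Y >= c), summing `terms` probabilities from c upward."""
    return sum(poisson_pmf(j, mean) for j in range(c, c + terms))


def conditional_mean(mean, c, terms=200):
    """E(Y | Y >= c), computed from the truncated distribution."""
    weighted = sum(j * poisson_pmf(j, mean) for j in range(c, c + terms))
    return weighted / poisson_tail(mean, c, terms)


def estimate_mean(cond_mean, e1, e2):
    """u = E(Y | Y >= c) * E_1 / E_2, as in Proposition 4.3.1."""
    return cond_mean * e1 / e2


# Hypothetical setting: true mean 3.5, threshold K = 2, so c = K + 2 = 4,
# and 500 relevant documents.
true_u, c, n_rel = 3.5, 4, 500
e1 = n_rel * poisson_tail(true_u, c)        # similarity >= c
e2 = n_rel * poisson_tail(true_u, c - 1)    # similarity >= c - 1
u_hat = estimate_mean(conditional_mean(true_u, c), e1, e2)
```

Only documents with similarity at least c-1 > K enter the computation, so all three quantities on the right hand side are observable among the retrieved relevant documents.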


Without loss of generality, $u_1$ is estimated by the
following proposition.


Proposition 4.3.2

Let $Y = T_1 + \ldots + T_r$,
    $Z = T_2 + \ldots + T_r$
and $u_{1c} = E[T_1 \mid Y = c]$.

If Z and $T_1$ are independent Poisson random variables,
then $u_1$ can be estimated by

$$ u_1 = u \cdot \frac{u_{1c}}{c}, \quad \text{for } c = K+1, K+2, \text{ etc.} $$

Proof

Since $T_1$ and Z are independent Poisson random variables
with parameters $u_1$ and $u - u_1$ respectively,

$$ P(T_1 = t,\; Z = z) = \frac{u_1^{t} e^{-u_1}}{t!} \cdot \frac{(u-u_1)^{z} e^{-(u-u_1)}}{z!} $$

$$ \Rightarrow P(T_1 = t,\; Y = y) = \frac{u_1^{t} e^{-u_1}}{t!} \cdot \frac{(u-u_1)^{y-t} e^{-(u-u_1)}}{(y-t)!} $$

$$ P(T_1 = t \mid Y = y) = \frac{P(T_1 = t,\; Y = y)}{P(Y = y)} $$

After simplification,

$$ P(T_1 = t \mid Y = y) = \binom{y}{t} \left( \frac{u_1}{u} \right)^{t} \left( 1 - \frac{u_1}{u} \right)^{y-t}, $$

which is a Binomial distribution with parameters $(y, u_1/u)$.
Since $u_{1c}$ is the mean of the above binomial distribution
with y = c,

$$ u_{1c} = c \cdot \frac{u_1}{u}. $$

Remark: When $c \ge K+1$, $u_{1c}$ can be estimated. If the
$T_i$'s, when restricted to R, are independent, then the
hypotheses of Propositions 4.3.1 and 4.3.2 hold.
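The estimate of Proposition 4.3.2 can be illustrated by computing $E[T_1 \mid Y = c]$ directly from the joint Poisson probabilities and then applying $u_1 = u \cdot u_{1c}/c$. The function names and the numerical values are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability P(T = k)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def conditional_mean_t1(u1, u, c):
    """E[T_1 | Y = c] for independent Poisson T_1 (mean u1) and Z (mean u - u1)."""
    joint = [poisson_pmf(t, u1) * poisson_pmf(c - t, u - u1) for t in range(c + 1)]
    return sum(t * pr for t, pr in enumerate(joint)) / sum(joint)


def estimate_u1(u, u1c, c):
    """u_1 = u * u_1c / c, as in Proposition 4.3.2."""
    return u * u1c / c


# Hypothetical values: E[T_1] = 1.2, E[Y] = 4.0, and documents with similarity c = 5.
u1c = conditional_mean_t1(1.2, 4.0, 5)
u1_hat = estimate_u1(4.0, u1c, 5)
```

The conditional mean equals c times $u_1/u$, the mean of the binomial distribution, so the estimate returns the original $u_1$.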


4.4 Estimation of Parameters in Model 3

The parameters to be estimated are $\mathbf{u}$ and $S_1$. With the
same notations as in section 4.3, the estimation process
consists of first estimating u (mean of Y, $E[Y]$) and $S^2$
(variance of Y, $E[(Y-u)^2]$) and then expressing $u_i$ and
$S_1(i,j)$ in terms of these quantities.

The following additional notations are used.

Let h(y) and N(t) be respectively the probability density
function and the moment generating function of Y; let g(y)
and M(t) be respectively the probability density function
and the moment generating function of the conditional random
variable Y|Y>c.


By [10],

$$ E(Y \mid Y>c) = M'(0), $$
$$ E(Y^2 \mid Y>c) = M''(0), $$
$$ E(Y^3 \mid Y>c) = M'''(0) $$

and

$$ N(t) = \exp(ut + S^2 t^2 / 2) \qquad (4.4.1) $$

We now relate u and $S^2$ to the estimable quantities
M'(0), M''(0) and M'''(0), making use of the following
identity,

$$ \frac{d}{dt} \int_{b(t)}^{a(t)} h(y)\,dy = a'(t)\,h[a(t)] - b'(t)\,h[b(t)] \qquad (4.4.2) $$


Proposition 4.4.1

If Y is a normal random variable with mean u and
variance $S^2$, then

$$ u = \frac{M'''(0) - M'(0)\,\left( 2M''(0) - 2cM'(0) + c^2 \right)}{M''(0) - 2\,(M'(0))^2 + 2cM'(0) - c^2} $$

and

$$ S^2 = M''(0) + u\,(c - M'(0)) - cM'(0), $$

where M(t) is the moment generating function of Y|Y>c.


Proof

$$ g(y) = \begin{cases}
0 & \text{if } y \le c \\[4pt]
\dfrac{h(y)}{P(Y>c)} = \dfrac{1}{(2\pi)^{1/2} S} \exp\left( \dfrac{-(y-u)^2}{2S^2} \right) \Big/ P(Y>c) & \text{if } y > c
\end{cases} $$

Thus,

$$\begin{aligned}
M(t) &= \int_{c}^{\infty} e^{ty} g(y)\,dy \\
 &= \frac{1}{P(Y>c)} \cdot \frac{\exp(ut + S^2 t^2/2)}{(2\pi)^{1/2} S} \int_{c}^{\infty} \exp\left( \frac{-(y-u-S^2 t)^2}{2S^2} \right) dy \\
 &= \frac{N(t)}{P(Y>c)} \int_{c-S^2 t}^{\infty} \frac{1}{(2\pi)^{1/2} S} \exp\left( \frac{-(y-u)^2}{2S^2} \right) dy \qquad \text{by (4.4.1)} \\
 &= N(t)\,G(t) / P(Y>c)
\end{aligned}$$

where

$$ G(t) = \int_{c-S^2 t}^{\infty} \frac{1}{(2\pi)^{1/2} S} \exp\left( \frac{-(y-u)^2}{2S^2} \right) dy $$

Differentiating M(t) with respect to t, making use of
identity (4.4.2) and setting t=0, the following three
equations are obtained.

$$ M'(0) = u + \frac{G'(0)}{P(Y>c)} $$
$$ M''(0) = u^2 + S^2 + (c+u)\,\frac{G'(0)}{P(Y>c)} $$
$$ M'''(0) = u^3 + 3uS^2 + (2S^2 + uc + u^2 + c^2)\,\frac{G'(0)}{P(Y>c)} $$

$G'(0)/P(Y>c)$, u and $S^2$ can be considered as unknowns in the
above equations.

Eliminating $G'(0)/P(Y>c)$ from the second and third equations
by means of the first,

$$ M''(0) = u^2 + S^2 + (c+u)\,(M'(0) - u) \qquad (4.4.3) $$
$$ M'''(0) = u^3 + 3uS^2 + (2S^2 + uc + u^2 + c^2)\,(M'(0) - u) \qquad (4.4.4) $$

Identity (4.4.3) gives

$$\begin{aligned}
S^2 &= M''(0) - u^2 - (c+u)\,(M'(0) - u) \\
    &= M''(0) + u\,[c - M'(0)] - cM'(0) \qquad (4.4.5)
\end{aligned}$$

After substituting (4.4.5) into (4.4.4) and solving for u,
the desired result follows.


■ 
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The  above  proposition  can  be  used  to  estimate  u 
by  setting  c  =  K+l,  K+2  etc. 
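As a numerical illustration of Proposition 4.4.1, the following Python sketch recovers u and $S^2$ from the first three raw moments of $Y \mid Y > c$. The function names are illustrative and do not come from the thesis. The moments are generated here from known parameters using the three equations of the proof; in practice they would presumably be replaced by sample moments, which this sketch does not model.

```python
import math


def normal_params_from_truncated_moments(m1, m2, m3, c):
    """Apply Proposition 4.4.1; m1, m2, m3 stand for M'(0), M''(0), M'''(0)."""
    numerator = m3 - m1 * (2.0 * m2 - 2.0 * c * m1 + c * c)
    denominator = m2 - 2.0 * m1 * m1 + 2.0 * c * m1 - c * c
    u = numerator / denominator
    s2 = m2 + u * (c - m1) - c * m1  # identity (4.4.5)
    return u, s2


def truncated_raw_moments(u, s, c):
    """Raw moments of Y | Y > c for Y ~ N(u, s^2), from the equations in the proof."""
    alpha = (c - u) / s
    density = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(alpha / math.sqrt(2.0))  # P(Y > c)
    ratio = s * density / tail  # G'(0) / P(Y > c)
    m1 = u + ratio
    m2 = u * u + s * s + (c + u) * ratio
    m3 = u ** 3 + 3.0 * u * s * s + (2.0 * s * s + u * c + u * u + c * c) * ratio
    return m1, m2, m3


if __name__ == "__main__":
    # Hypothetical parameters chosen only for the demonstration.
    moments = truncated_raw_moments(u=2.0, s=1.5, c=1.0)
    print(normal_params_from_truncated_moments(*moments, c=1.0))
```

Because the formula for u divides by a difference of moments, it is likely to be sensitive to sampling error when that denominator is close to zero.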

Next, estimate $u_i$ and $S_1(i,j)$.

By [11], $(T_i, Y)$, $1 \le i \le n$, is bivariate normal, and $(T_i \mid Y=c)$ is also normal with

$$\text{mean} = u_i + g_{iY}\,\frac{S_i}{S}\,(c-u) \quad\text{and}\quad \text{variance} = S_i^2\,(1 - g_{iY}^2)$$

where $S_i^2 = \mathrm{Var}(T_i)$ and $g_{iY}$ is the correlation coefficient of $T_i$ and Y.

By definition,

$$u_{ic} = E(T_i \mid Y=c) = u_i + g_{iY}\,\frac{S_i}{S}\,(c-u). \tag{4.4.6}$$

Let a and b be any two values of c, c > K. Substituting these values in (4.4.6), we obtain

$$g_{iY}\,\frac{S_i}{S} = \frac{u_{ia} - u_{ib}}{a-b} \tag{4.4.7}$$

Substituting (4.4.7) into (4.4.6), and solving for $u_i$,

$$u_i = u_{ic} - \frac{u_{ia} - u_{ib}}{a-b}\,(c-u) \tag{4.4.8}$$

(4.4.7) can also be written as

$$g_{iY}\,S_i = S\,\frac{u_{ia} - u_{ib}}{a-b} \tag{4.4.9}$$

Hence (4.4.8) and (4.4.9) permit the estimation of $u_i$ and $g_{iY}S_i$, $1 \le i \le n$.

Now,

$$E(T_i^2 \mid Y=c) = \mathrm{Var}(T_i \mid Y=c) + [E(T_i \mid Y=c)]^2 = S_i^2\,(1 - g_{iY}^2) + u_{ic}^2 = S_i^2 - (S_i g_{iY})^2 + u_{ic}^2$$

$$\Rightarrow\quad S_i^2 = E(T_i^2 \mid Y=c) - u_{ic}^2 + \left[ S\,\frac{u_{ia} - u_{ib}}{a-b} \right]^2$$

Hence $S_i^2$, $1 \le i \le n$, can also be estimated by setting c > K in the above equation.

The covariance of $T_i$ and $T_j$, $S_1(i,j)$, can be estimated as follows.

Let $S_{ij} = S_1(i,j)$.

By [11], $(T_i + T_j \mid Y=c)$ is normal with

$$\text{mean} = (u_i + u_j) + g_{(i+j)Y}\,\frac{S_{(i+j)}}{S}\,(c-u) \quad\text{and}\quad \text{variance} = S_{(i+j)}^2 - [\,S_{(i+j)}\,g_{(i+j)Y}\,]^2,$$

where $S_{(i+j)}$ is the standard deviation of $(T_i + T_j)$ and $g_{(i+j)Y}$ is the correlation coefficient of $(T_i + T_j)$ and Y.

Applying the same technique as in the estimation of $S_i^2$,

$$S_{(i+j)}^2 = E[(T_i + T_j)^2 \mid Y=c] - u_{(i+j)c}^2 + \left[ S\,\frac{u_{(i+j)a} - u_{(i+j)b}}{a-b} \right]^2$$

where $u_{(i+j)c} = E(T_i + T_j \mid Y=c)$.

Hence $S_{(i+j)}^2$ can be estimated by setting c = K+1, K+2, etc. in the above equation.

But

$$S_{(i+j)}^2 = \mathrm{Var}(T_i + T_j) = \mathrm{Var}(T_i) + \mathrm{Var}(T_j) + 2\,\mathrm{Cov}(T_i, T_j) = S_i^2 + S_j^2 + 2S_{ij},$$

therefore

$$S_{ij} = [\,S_{(i+j)}^2 - S_i^2 - S_j^2\,]/2$$

can be estimated.
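The two-cutoff estimates (4.4.7) to (4.4.9) and the covariance identity above can be sketched in Python as follows. The function and parameter names are hypothetical. The inputs `u_ia` and `u_ib` stand for estimates of $E(T_i \mid Y=a)$ and $E(T_i \mid Y=b)$, and `u` and `s` for estimates of the mean and standard deviation of Y obtained from Proposition 4.4.1.

```python
def estimate_term_mean(u_ia, u_ib, a, b, u, s):
    """Return (u_i, g_iY * S_i) from conditional means at two cutoffs a and b."""
    if a == b:
        raise ValueError("the two cutoffs must differ")
    slope = (u_ia - u_ib) / (a - b)   # (4.4.7): g_iY * S_i / S
    u_i = u_ia - slope * (a - u)      # (4.4.8) evaluated at c = a
    return u_i, s * slope             # (4.4.9): g_iY * S_i


def estimate_covariance(var_sum, var_i, var_j):
    """S_ij = [S_(i+j)^2 - S_i^2 - S_j^2] / 2."""
    return (var_sum - var_i - var_j) / 2.0


if __name__ == "__main__":
    # Made-up inputs, used only to exercise the formulas.
    print(estimate_term_mean(u_ia=3.5, u_ib=4.0, a=12.0, b=14.0, u=10.0, s=4.0))
    print(estimate_covariance(var_sum=13.0, var_i=4.0, var_j=5.0))
```

Evaluating (4.4.8) at c = a rather than at a third cutoff is a choice made here for brevity; any cutoff c > K with an available estimate of $u_{ic}$ would serve.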

4.5 Estimation of the Number of Relevant Documents

Let $u_i$ and $v_i$ be the estimated expected weight of the $i$-th term in the relevant documents and the irrelevant documents respectively. Let $F_i$ be the sum of the weights of the $i$-th term in the set of all documents. Then $|R|$ can be estimated by solving the following two equations with unknowns $|R|$ and $|I|$.

$$u_i\,|R| + v_i\,|I| = F_i$$

$$|R| + |I| = |D|.$$
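Solving the two equations by substitution gives $|R| = (F_i - v_i|D|)/(u_i - v_i)$. A minimal Python sketch, with hypothetical names and made-up inputs:

```python
def estimate_relevant_count(u_i, v_i, f_i, total_docs):
    """Solve u_i*|R| + v_i*|I| = F_i and |R| + |I| = |D| for (|R|, |I|)."""
    if u_i == v_i:
        raise ValueError("term i does not discriminate: u_i equals v_i")
    relevant = (f_i - v_i * total_docs) / (u_i - v_i)
    return relevant, total_docs - relevant


if __name__ == "__main__":
    # Illustrative values only: 1100 documents, F_i = 300.
    print(estimate_relevant_count(u_i=0.5, v_i=0.25, f_i=300.0, total_docs=1100))
```

The estimate is undefined when $u_i = v_i$ and is likely to be unstable when the two are close, so a term with well separated expected weights is the natural choice.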


Chapter 5

Measurement of the Importance of Index Term with respect to Retrieval Performance in Model 1


5.1 Performance Measure

It is common to have a few hundred terms in a feedback query. Retrieval using so many terms is clearly not economical. Therefore, it is desirable to rank the terms according to their usefulness with respect to the user so that the less useful terms can be discarded. The analysis of ranking terms turns out to be rather involved. Thus analysis is restricted here to the study of the effect of deleting a term in Model 1.

The optimal query Q* in Model 1 is

$$Q^* = (w_1, w_2, \ldots, w_n) \quad\text{where}\quad w_i = \log \frac{p_i/(1-p_i)}{r_i/(1-r_i)}, \quad 1 \le i \le n.$$

Without loss of generality, assume

$$|w_1| \ge |w_2| \ge \cdots \ge |w_n| > 0$$

and the set of document vectors is

$$D = \{x_1, x_2, \ldots, x_{2^n}\}$$

where each $x_t$ has a distinct combination of n terms.

The effect on retrieval performance of deleting term j versus deleting term i will be studied. It will be shown that the $i$-th term is "more" important than the $j$-th term if $|w_i| > |w_j|$. Thus the $n$-th term is the "least important" term in Q*.

Let $Q^*-\{i\}$ be the query identical to Q* except that its $i$-th component is zero, i.e. the $i$-th term of Q* is deleted. Let $i \in x$ and $i \notin x$ be the events that the $i$-th component of x is 1 and 0 respectively.


Retrieval performance is best measured in terms of recall and precision (see chapter 2). An example is presented in section 5.2 which illustrates that $Q^*-\{i\}$ is not better than $Q^*-\{j\}$ at all recall levels for any i and j. Thus, there is no "best" query in $\{\,Q^*-\{i\} \mid 1 \le i \le n\,\}$ in terms of recall and precision. However, it will be shown in section 5.3 that $Q^*-\{j\}$ is "better" than $Q^*-\{i\}$ with respect to document ranking if $|w_i| > |w_j|$.

In order to compare the performance of $Q^*-\{i\}$ and $Q^*-\{j\}$, we first define associates and a measure of goodness of a document. By making use of this measure, the documents retrieved by $Q^*-\{i\}$ will be compared to those of $Q^*-\{j\}$.

Definition 5.1.1

Document $x_m$ is said to be better than document $x_p$ if $f(Q^*, x_m) > f(Q^*, x_p)$.


Let $g(x) = f(Q^*, x)$. The value of $g(x)$ is a measure of "goodness" of document x. The notion of goodness is consistent with the probability of relevance, $\phi(x)$, of x in chapter 2 and

$$g(x_m) > g(x_p) \iff \phi(x_m) > \phi(x_p).$$


Definition 5.1.2

The associate of $x_m$ with respect to term i is the vector $x_p$ which is identical to $x_m$ except at term i. $\{x_m, x_p\}$ is called a pair of associates with respect to term i and there are $2^{n-1}$ distinct pairs of associates of term i.


Clearly when $\{x_m, x_p\}$ are associates with respect to term i, the following properties hold.

(a) $|f(Q^*, x_m) - f(Q^*, x_p)| = |w_i|$,

(b) $f(Q^*-\{i\}, x_m) = f(Q^*-\{i\}, x_p) = \min\{\,f(Q^*, x_m),\ f(Q^*, x_p)\,\}$ when $w_i > 0$ (the maximum of the two when $w_i < 0$), i.e. $Q^*-\{i\}$ gives the same rank to $x_m$ and $x_p$,

(c) if $\{x_m', x_p'\}$ is another pair of associates with respect to term i, $g(x_m) > g(x_p)$ and $g(x_m') > g(x_p')$, then

$$g(x_m) > g(x_m') \iff g(x_p) > g(x_p').$$
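Properties (a) and (b) can be checked by enumeration for a small query. The sketch below assumes the matching function f is the inner product of query and document vectors; the weights are made up and the helper names are illustrative. Term indices are 0-based in the code, whereas the text counts terms from 1.

```python
from fractions import Fraction
from itertools import product


def score(query, x):
    """Assumed matching function: inner product of query weights and document."""
    return sum(w * xi for w, xi in zip(query, x))


def associate_pairs(n, i):
    """Return the 2**(n-1) pairs of binary vectors that differ only at index i."""
    pairs = []
    for x in product((0, 1), repeat=n):
        if x[i] == 0:
            pairs.append((x, x[:i] + (1,) + x[i + 1:]))
    return pairs


def delete_term(query, i):
    """Q* - {i}: the same query with component i set to zero."""
    return tuple(0 if t == i else w for t, w in enumerate(query))


if __name__ == "__main__":
    q = (Fraction(17, 10), Fraction(-9, 10), Fraction(2, 5))  # hypothetical weights
    i = 1
    reduced = delete_term(q, i)
    for without_i, with_i in associate_pairs(len(q), i):
        gap = abs(score(q, with_i) - score(q, without_i))        # property (a)
        tied = score(reduced, with_i) == score(reduced, without_i)  # property (b)
        print(without_i, with_i, gap, tied)
```

Exact rational weights are used so that equality of retrieval status values is not disturbed by floating-point rounding.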


Definition 5.1.3

If $Q_1$ retrieves $\{x_{1,1}, x_{1,2}, \ldots, x_{1,k}\}$, a set of k documents, $Q_2$ retrieves $\{x_{2,1}, x_{2,2}, \ldots, x_{2,k}\}$, a set of k documents, and

$$g(x_{i,m}) \ge g(x_{i,m+1}), \quad 1 \le m < k, \quad i = 1, 2,$$

then $Q_1$ is said to be better than $Q_2$ with respect to document ranking (written as $Q_1 > Q_2$) if $g(x_{1,m}) \ge g(x_{2,m})$ for $1 \le m \le k$ and at least one inequality is '>' for some k.


5.2 Q*-{j} is not better than Q*-{i} at all recall levels

An optimal retrieval rule retrieves documents in descending order of relevance or "goodness". The deletion of term i from Q* causes the associates with respect to term i to be retrieved together.

Suppose document x which does not contain term i is between the two associates $\{x_m, x_p\}$ with respect to term i, $g(x_m) > g(x) > g(x_p)$. Then deleting term i has no effect on the "retrieval status value" of document x but gives $x_m$ and $x_p$ the same retrieval status value: that of $x_m$ is lowered to that of $x_p$ if $w_i > 0$, and that of $x_p$ is raised to that of $x_m$ if $w_i < 0$. If $w_i > 0$, then $x_m$, which has a higher probability of relevance than $x_p$, is retrieved after x. If $w_i < 0$, then $x_p$, which has a lower probability of relevance, is retrieved ahead of x. In either case a loss of precision results. Similarly, it can also be seen that worse retrieval performance will result when document x contains term i. Thus the more documents there are between associates of a term, the higher the likelihood of losing precision when the term is deleted. As a consequence, when considering the choice of deleting one term versus another term, it is desirable to choose the term with as few documents between its associates as possible.

Intuitively, it seems that $Q^*-\{j\}$ would be better than $Q^*-\{i\}$ if $|w_i| > |w_j|$. However the following example illustrates that the precision of $Q^*-\{j\}$ can be lower than that of $Q^*-\{i\}$ at some recall points, though $Q^*-\{j\}$ is actually better with respect to document ranking, as is shown in section 5.3.

Let $Q^* = (w_1, w_2)$.

Let $|R(x)|$ and $|I(x)|$ be the expected number of relevant and irrelevant documents with representation x respectively.

If x = (1, 0) then $|R(x)| = |R| \cdot p_1 \cdot (1-p_2)$ and $|I(x)| = |I| \cdot r_1 \cdot (1-r_2)$.

The performance of $Q^*-\{1\}$ and $Q^*-\{2\}$ will be compared.

Assume $|R| = 100$, $|I| = 1000$ and

          T1      T2
    p_i   0.90    0.20
    r_i   0.75    0.40
    w_i   0.48   -0.44

                  f(Q*,x)   |I(x)|   |R(x)|   cum.|I(x)| / cum.|R(x)|
    x1 = (1,0)     0.48      450       72      450/72   =  6.25
    x2 = (1,1)     0.04      300       18      750/90   =  8.33
    x3 = (0,0)     0.00      150        8      900/98   =  9.18
    x4 = (0,1)    -0.44      100        2      1000/100 = 10.00

Associates of term 2 are $\{x_1, x_2\}$ and $\{x_3, x_4\}$.

Associates of term 1 are $\{x_1, x_3\}$ and $\{x_2, x_4\}$.

$Q^*-\{2\}$ ranks $\{x_1, x_2\}$ highest.

$Q^*-\{1\}$ ranks $\{x_1, x_3\}$ highest, and $g(x_2) > g(x_3)$.

The number of relevant documents with representation $x_1$ and $x_2$ is 90.

The number of irrelevant documents with representation $x_1$ and $x_2$ is 750.

The number of relevant documents with representation $x_1$ and $x_3$ is 80.

The number of irrelevant documents with representation $x_1$ and $x_3$ is 600.

If only 80 relevant documents are recalled, $Q^*-\{1\}$ retrieves 600 irrelevant ones, but $Q^*-\{2\}$ retrieves 750*80/90, or about 667, irrelevant ones.

$Q^*-\{2\}$ is not better than $Q^*-\{1\}$ at this recall level. The reason is that a random sample has to be drawn from the set $\{x_1, x_2\}$, since documents in the set have the same retrieval status value with respect to $Q^*-\{2\}$.
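The arithmetic of this example can be reproduced with a short Python sketch. It assumes that documents tied under a reduced query are sampled in proportion to their expected counts, as the random-sampling remark above suggests, and it takes $w_2 = -0.44$ as an assumption inferred from $f(Q^*, x_2) = 0.04$. All names are illustrative, and term indices are 0-based in the code.

```python
from fractions import Fraction
from itertools import product

R_TOTAL, I_TOTAL = 100, 1000
P = (Fraction(9, 10), Fraction(1, 5))        # p_1, p_2
R = (Fraction(3, 4), Fraction(2, 5))         # r_1, r_2
W = (Fraction(48, 100), Fraction(-44, 100))  # w_1, w_2 (w_2 is an assumed value)


def expected_counts(x):
    """Expected (relevant, irrelevant) documents with representation x."""
    rel, irr = Fraction(R_TOTAL), Fraction(I_TOTAL)
    for xi, p, r in zip(x, P, R):
        rel *= p if xi else 1 - p
        irr *= r if xi else 1 - r
    return rel, irr


def irrelevant_at_recall(deleted, target_relevant):
    """Expected irrelevant documents retrieved by Q* - {deleted} by the time
    target_relevant relevant documents have been recalled."""
    groups = {}
    for x in product((0, 1), repeat=len(W)):
        s = sum(w * xi for t, (w, xi) in enumerate(zip(W, x)) if t != deleted)
        rel, irr = expected_counts(x)
        totals = groups.setdefault(s, [Fraction(0), Fraction(0)])
        totals[0] += rel
        totals[1] += irr
    irrelevant, remaining = Fraction(0), Fraction(target_relevant)
    for s in sorted(groups, reverse=True):
        rel, irr = groups[s]
        if rel >= remaining:
            # Only part of this tied group is needed: sample it proportionally.
            return irrelevant + irr * remaining / rel
        irrelevant += irr
        remaining -= rel
    raise ValueError("target exceeds the number of relevant documents")


if __name__ == "__main__":
    print(float(irrelevant_at_recall(0, 80)))  # Q* - {1}
    print(float(irrelevant_at_recall(1, 80)))  # Q* - {2}
```

With these inputs the two calls give 600 and 2000/3 (about 667), which are the figures quoted in the example.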


5.3 Which is better, Q*-{j} or Q*-{i}?

In this section $Q^*-\{j\}$ is shown to be better than $Q^*-\{i\}$ with respect to document ranking (Definition 5.1.3) if $|w_i| > |w_j|$. As a result, $Q^*-\{n\}$ ranks documents closer to the optimal ranking by Q* than does $Q^*-\{i\}$, $1 \le i < n$.

First, a series of lemmas about the properties of the best document with respect to Q* are proved. Second, Proposition 5.3.4 relates the documents retrieved by $Q^*-\{i\}$ and Q*. Third, Proposition 5.3.6 illustrates the relationship between the associates of term i and the associates of term j, making use of Lemma 5.3.5. Lastly, the main result, Proposition 5.3.7, shows that $Q^*-\{j\}$ is better with respect to document ranking.

We further assume no two distinct document vectors have the same retrieval status value with respect to Q*. Without loss of generality, assume

$$g(x_1) > g(x_2) > \cdots > g(x_{2^n}).$$

The above assumption guarantees, for any i, $1 \le i \le n$, that $f(Q^*-\{i\}, x_m) = f(Q^*-\{i\}, x_p) \Rightarrow x_m, x_p$ are associates with respect to term i.

The following lemma shows that the best document $x_1$ is retrieved ahead of any other document by $Q^*-\{i\}$, $1 \le i \le n$.


Lemma 5.3.1

For any i, $1 \le i \le n$,

$$f(Q^*-\{i\}, x_1) \ge f(Q^*-\{i\}, x_t) \quad\text{for } 1 \le t \le 2^n.$$

Proof

Consider the case when $w_i > 0$ (the case where $w_i < 0$ can be handled in a similar way).

Let $x_v$ be an associate of $x_1$ with respect to term i. Then $i \in x_1$ and $i \notin x_v$. Suppose there is a document $x_m$ lying between $x_1$ and $x_v$, i.e. $g(x_1) > g(x_m) > g(x_v)$.

$x_m$ must contain term i, otherwise its associate with respect to term i would be better than $x_1$.

Now

$$f(Q^*, x_1) = f(Q^*-\{i\}, x_1) + w_i$$

$$f(Q^*, x_m) = f(Q^*-\{i\}, x_m) + w_i$$

$$g(x_1) > g(x_m) \Rightarrow f(Q^*-\{i\}, x_1) > f(Q^*-\{i\}, x_m).$$

Hence any document better than $x_v$ satisfies the inequality. Next consider any document $x_t$ that is worse than $x_v$.

$$f(Q^*-\{i\}, x_1) = f(Q^*-\{i\}, x_v) = f(Q^*, x_v) > f(Q^*, x_t) \ge f(Q^*-\{i\}, x_t).$$

Lemma 5.3.2

The associate of $x_1$ with respect to term n must be $x_2$ if $|w_i| > |w_n|$, $1 \le i < n$.

Proof

For simplicity, assume $w_n > 0$.

In order to be the document of highest relevance, $x_1$ must contain all the positive terms and none of the negative terms. Any x different from $x_1$ must contain at least one negative term or be missing at least one positive term. In either case, $f(Q^*, x_1) - f(Q^*, x) \ge w_n$ and the result follows.


As a consequence of the above two lemmas, the first pair of documents retrieved by $Q^*-\{n\}$ is $\{x_1, x_2\}$. This indicates that $Q^*-\{n\}$ is better than $Q^*-\{i\}$, $1 \le i \le n-1$, with respect to document ranking when retrieving the first pair of documents. The same result is true when comparing $Q^*-\{j\}$ with $Q^*-\{i\}$, for $|w_i| > |w_j|$. This is demonstrated by Lemma 5.3.3, whose proof is trivial.


Lemma 5.3.3

Let $x_{i,2}$ and $x_{j,2}$ be the associates of $x_1$ with respect to term i and j respectively, then

$$g(x_{j,2}) > g(x_{i,2}) \iff |w_i| > |w_j|.$$


The following proposition relates the documents retrieved by $Q^*-\{i\}$ and Q*. As a consequence of the proposition, it can be shown easily by induction on k that whenever $Q^*-\{i\}$ retrieves 2k documents, the 2k documents must include $x_1, x_2, \ldots, x_k$.


Proposition 5.3.4

Let $\{x_{i,1}, x_{i,2}, \ldots, x_{i,2k}\}$ be the set of 2k documents having the highest retrieval status value with respect to $Q^*-\{i\}$, for some i, $1 \le i \le n$.

If $\{x_m, x_p\}$ is a pair of associates of term i, $g(x_m) > g(x_p)$ and $f(Q^*-\{i\}, x_{i,t}) \ge f(Q^*-\{i\}, x_p)$, $1 \le t \le 2k$, then

(i) $g(x_{i,t}) \ge g(x_p)$, $1 \le t \le 2k$,

(ii) $\{x_{i,1}, x_{i,2}, \ldots, x_{i,2k}\} = \{x_1, x_2, \ldots, x_m,\ (2k-m-1)$ documents between $x_m$ and $x_p,\ x_p\}$,

i.e. $g(x) > g(x_m) \Rightarrow f(Q^*-\{i\}, x) > f(Q^*-\{i\}, x_m)$.


Proof

(i) With $d_1 = 1$ if $i \in x_{i,t}$, $d_1 = 0$ if $i \notin x_{i,t}$, and $d_2 = 1$ if $w_i < 0$, $d_2 = 0$ if $w_i > 0$,

$$g(x_{i,t}) = f(Q^*, x_{i,t}) = f(Q^*-\{i\}, x_{i,t}) + d_1 w_i$$

$$\ge f(Q^*-\{i\}, x_p) + d_1 w_i$$

$$= f(Q^*, x_p) - d_2 w_i + d_1 w_i$$

$$= f(Q^*, x_p) + (d_1 - d_2)\,w_i$$

$$\ge f(Q^*, x_p).$$

(ii) With $d_1 = 1$ if $i \in x$, $d_1 = 0$ if $i \notin x$, and $d_2 = 0$ if $w_i < 0$, $d_2 = 1$ if $w_i > 0$,

$$g(x) > g(x_m)$$

$$\iff f(Q^*, x) > f(Q^*, x_m)$$

$$\iff f(Q^*-\{i\}, x) + d_1 w_i = f(Q^*, x) > f(Q^*, x_m)$$

$$\Rightarrow f(Q^*-\{i\}, x) + d_2 w_i \ge f(Q^*, x) > f(Q^*, x_m)$$

$$\iff f(Q^*-\{i\}, x) + d_2 w_i > f(Q^*, x_m) = f(Q^*-\{i\}, x_m) + d_2 w_i$$

$$\iff f(Q^*-\{i\}, x) > f(Q^*-\{i\}, x_m).$$

The following lemma can be proved easily by induction on n. It is used to prove Proposition 5.3.6.

Lemma 5.3.5

Let $S_1 = \{a_1, a_2, \ldots, a_n\}$ and $S_2 = \{b_1, b_2, \ldots, b_n\}$ be two sets of real numbers of arbitrary size n.

If $a_k \ge b_k$ for $1 \le k \le n$, then $a_{(k)} \ge b_{(k)}$ for $1 \le k \le n$,

where $a_{(k)}$ and $b_{(k)}$ are the $k$-th largest numbers in $S_1$ and $S_2$ respectively.

The next proposition compares the $k$-th "best" associate pair of term i with the $k$-th "best" associate pair of term j. There are $2^{n-1}$ associate pairs of term i and $2^{n-1}$ associate pairs of term j. We can compare them pairwise by first rearranging the associate pairs in their order of retrieval status value w.r.t. $Q^*-\{i\}$ and $Q^*-\{j\}$ respectively.


Proposition 5.3.6

Let $\{\,\{x_{i,2k-1}, x_{i,2k}\} : 1 \le k \le 2^{n-1}\,\}$ and $\{\,\{x_{j,2k-1}, x_{j,2k}\} : 1 \le k \le 2^{n-1}\,\}$ be the $2^{n-1}$ associate pairs of term i and term j respectively.

If the associate pairs are rearranged such that

$$g(x_{i,2k-1}) > g(x_{i,2k}), \quad g(x_{j,2k-1}) > g(x_{j,2k}), \quad 1 \le k \le 2^{n-1},$$

$$f(Q^*-\{i\}, x_{i,2k-1}) > f(Q^*-\{i\}, x_{i,2k+1}), \quad f(Q^*-\{j\}, x_{j,2k-1}) > f(Q^*-\{j\}, x_{j,2k+1}), \quad 1 \le k < 2^{n-1},$$

and $|w_i| > |w_j|$, then

$$g(x_{i,2k-1}) \ge g(x_{j,2k-1}) \quad\text{and}\quad g(x_{i,2k}) \le g(x_{j,2k}), \quad 1 \le k \le 2^{n-1}.$$

The proposition states that the better associate, $x_{i,2k-1}$, with respect to term i is better than the better associate, $x_{j,2k-1}$, with respect to term j but the worse associate, $x_{i,2k}$, with respect to term i is worse than the worse associate, $x_{j,2k}$, with respect to term j. Thus, it is still not clear whether $Q^*-\{j\}$ is better than $Q^*-\{i\}$.

Sketch of proof

(a) Construct a simple one-to-one mapping M from D to D such that if $\{x_m, x_p\}$ are associates with respect to term j and $g(x_m) > g(x_p)$, then

    (i) $\{M(x_m), M(x_p)\}$ are associates w.r.t. term i,
    (ii) $g(M(x_m)) > g(M(x_p))$,
    (iii) $g(x_m) \le g(M(x_m))$,
    (iv) $g(x_p) \ge g(M(x_p))$.

(b) For the $t$-th pair $\{x_m, x_p\}$ of associates of term j, set

$$a_t = g(x_m), \quad b_t = g(M(x_m)), \quad c_t = g(x_p), \quad d_t = g(M(x_p)).$$

Then $a_t \le b_t$, $c_t \ge d_t$, $a_t > c_t$ and $b_t > d_t$ by definition of mapping M.

(c) By Lemma 5.3.5,

$$a_{(k)} \le b_{(k)}, \quad c_{(k)} \ge d_{(k)}, \quad 1 \le k \le 2^{n-1},$$

where $a_{(k)}$ is the $k$-th largest number among the $a_t$'s. Since $a_t - c_t = |w_j|$ and $b_t - d_t = |w_i|$ for all t, we have

$$a_{(k)} = a_t \text{ for some } t \Rightarrow c_{(k)} = c_t, \qquad b_{(k)} = b_t \text{ for some } t \Rightarrow d_{(k)} = d_t.$$

Hence

$$a_k > c_k \iff a_{(k)} > c_{(k)}, \qquad b_k > d_k \iff b_{(k)} > d_{(k)}, \quad 1 \le k \le 2^{n-1}.$$

(d) Show that the sorting of the a's, b's, c's and d's results in an arrangement of the associate pairs; more precisely,

$$g(x_{i,2k-1}) = b_{(k)} > d_{(k)} = g(x_{i,2k}),$$

$$g(x_{j,2k-1}) = a_{(k)} > c_{(k)} = g(x_{j,2k}),$$

$$b_{(k)} > b_{(k+1)} \iff f(Q^*-\{i\}, x_{i,2k-1}) > f(Q^*-\{i\}, x_{i,2k+1}),$$

$$a_{(k)} > a_{(k+1)} \iff f(Q^*-\{j\}, x_{j,2k-1}) > f(Q^*-\{j\}, x_{j,2k+1}).$$

(e) Finally show that the results are true,

$$a_{(k)} \le b_{(k)} \iff g(x_{j,2k-1}) \le g(x_{i,2k-1}),$$

$$c_{(k)} \ge d_{(k)} \iff g(x_{j,2k}) \ge g(x_{i,2k}).$$

Proof:

There are 4 cases

    (i) $w_i > w_j > 0$
    (ii) $w_i < w_j < 0$
    (iii) $w_i > 0$, $w_j < 0$, $w_i + w_j > 0$
    (iv) $w_i < 0$, $w_j > 0$, $w_i + w_j < 0$

A mapping M for cases (i) and (ii) is $M : D \to D$, where M(x) is the vector formed by interchanging the $i$-th and $j$-th component of x, i.e.

$$x = (\ldots, x_i, \ldots, x_j, \ldots), \qquad M(x) = (\ldots, x_j, \ldots, x_i, \ldots).$$

This mapping M can be shown to satisfy condition (a).

A mapping M for case (iii) and (iv) is $M : D \to D$, where M(x) is the vector formed by interchanging the $i$-th and $j$-th component of x and then complementing them, i.e.

$$x = (\ldots, x_i, \ldots, x_j, \ldots), \qquad M(x) = (\ldots, 1-x_j, \ldots, 1-x_i, \ldots).$$

This mapping M can also be shown to satisfy condition (a).


As an illustration, the proof of case (iv) is as follows:

(iv) $w_i < 0$, $w_j > 0$, $w_i + w_j < 0$

$$x = (\ldots, e_i, \ldots, e_j, \ldots), \qquad M(x) = (\ldots, (1-e_j), \ldots, (1-e_i), \ldots)$$

Let $\{x_m, x_p\}$ be a pair of associates of term j such that $g(x_m) > g(x_p)$ ($\Rightarrow j \in x_m$, $j \notin x_p$).

Clearly, $\{M(x_m), M(x_p)\}$ is a pair of associates of term i.

$$g(M(x_m)) - g(M(x_p)) = w_i(1-1) - w_i(1-0) = -w_i > 0.$$

$$g(x) - g(M(x)) = w_i e_i + w_j e_j - w_i(1-e_j) - w_j(1-e_i)$$

$$g(x_m) - g(M(x_m)) = w_i e_i + w_j \cdot 1 - w_i(1-1) - w_j(1-e_i) = (w_i + w_j)\,e_i \le 0$$

$$g(x_p) - g(M(x_p)) = w_i e_i + w_j \cdot 0 - w_i(1-0) - w_j(1-e_i) = (w_j + w_i)\,(e_i - 1) \ge 0$$

Conditions of (a) are satisfied.

Step (b) to step (e) are straightforward and the results immediately follow.
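Condition (a) can also be checked exhaustively for a small number of terms. The Python sketch below assumes $g(x)$ is the inner product of the weights with x, uses 0-based term indices, and picks the mapping by the signs of $w_i$ and $w_j$ as in the four cases above. The names are illustrative, and the check covers only the sample weights supplied.

```python
from fractions import Fraction
from itertools import product


def g(weights, x):
    """Assumed goodness measure: inner product of weights and document."""
    return sum(w * xi for w, xi in zip(weights, x))


def mapping_m(weights, i, j, x):
    """Interchange components i and j; complement them too if the signs differ."""
    y = list(x)
    if weights[i] * weights[j] > 0:   # cases (i) and (ii)
        y[i], y[j] = x[j], x[i]
    else:                             # cases (iii) and (iv)
        y[i], y[j] = 1 - x[j], 1 - x[i]
    return tuple(y)


def satisfies_condition_a(weights, i, j):
    """Check (a)(i)-(iv) for every pair of associates of term j."""
    n = len(weights)
    for x in product((0, 1), repeat=n):
        if x[j] != 0:
            continue
        partner = x[:j] + (1,) + x[j + 1:]
        x_m, x_p = sorted((x, partner), key=lambda d: g(weights, d), reverse=True)
        m_m = mapping_m(weights, i, j, x_m)
        m_p = mapping_m(weights, i, j, x_p)
        if [t for t in range(n) if m_m[t] != m_p[t]] != [i]:
            return False                              # (i) fails
        if not g(weights, m_m) > g(weights, m_p):
            return False                              # (ii) fails
        if not g(weights, x_m) <= g(weights, m_m):
            return False                              # (iii) fails
        if not g(weights, x_p) >= g(weights, m_p):
            return False                              # (iv) fails
    return True


if __name__ == "__main__":
    # Case (iv) with hypothetical weights: w_i < 0, w_j > 0, w_i + w_j < 0.
    w = (Fraction(-3, 10), Fraction(1, 10), Fraction(7, 10))
    print(satisfies_condition_a(w, i=0, j=1))
```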


The last proposition shows that $Q^*-\{j\} > Q^*-\{i\}$ with respect to document ranking if $|w_i| > |w_j|$.


Proposition 5.3.7

If the set of 2k documents retrieved by $Q^*-\{j\}$ and $Q^*-\{i\}$ are

$$\{x_{j,1}, x_{j,2}, \ldots, x_{j,2k}\} \quad\text{and}\quad \{x_{i,1}, x_{i,2}, \ldots, x_{i,2k}\}$$

respectively, and

$$g(x_{j,h}) > g(x_{j,h+1}), \quad g(x_{i,h}) > g(x_{i,h+1}), \quad 1 \le h < 2k,$$

then

$$g(x_{j,t}) \ge g(x_{i,t}), \quad 1 \le t \le 2k.$$

Proof

(By induction on k)

At k=1.

By Lemma 5.3.1 and Lemma 5.3.3, the proposition is true at k=1.

Suppose the proposition is true for 1 to k.

Let the set of 2k documents retrieved by $Q^*-\{j\}$ and $Q^*-\{i\}$ be $\{x_{j,1}, x_{j,2}, \ldots, x_{j,2k}\}$ and $\{x_{i,1}, x_{i,2}, \ldots, x_{i,2k}\}$ respectively.

Consider the case for k+1.

Let the next pair of documents retrieved by $Q^*-\{j\}$ and $Q^*-\{i\}$ be $\{x_m, x_p\}$ and $\{x_t, x_u\}$ respectively, such that $g(x_m) > g(x_p)$ and $g(x_t) > g(x_u)$.

By Proposition 5.3.4, $g(x_{j,2k}) > g(x_p)$ and $g(x_{i,2k}) > g(x_u)$.

By Proposition 5.3.6, $g(x_p) \ge g(x_u)$ but $g(x_m) \le g(x_t)$.

By Proposition 5.3.4, the set of 2k documents retrieved by $Q^*-\{i\}$ must be

$$\{x_1, x_2, \ldots, x_{t-1},\ 2k-t \text{ documents between } x_t \text{ and } x_{i,2k},\ x_{i,2k}\},$$

and by induction assumption and Proposition 5.3.4, the set of 2k documents retrieved by $Q^*-\{j\}$ must be

$$\{x_1, x_2, \ldots, x_{t-1}, \ldots, x_{m-1},\ 2k-m \text{ documents between } x_m \text{ and } x_{j,2k},\ x_{j,2k}\}.$$

After the inclusion of $\{x_m, x_p\}$ and $\{x_t, x_u\}$ respectively, the ordering is clearly preserved.

Corollary 5.3.7

$Q^*-\{n\}$ ranks documents closer to optimal than $Q^*-\{i\}$, $1 \le i < n$.

Hence we can rank the usefulness of terms in their order of $|w_i|$.

Chapter 6

Conclusion


The  thesis  identifies  the  objective  of  retrieving  more 
documents  that  are  of  interest  to  the  user  to  be  a  two-way 
classification  or  discriminant  problem.  Thus  an  optimal 
retrieval  rule  is  one  that  best  discriminates  the  set  of 
relevant  documents  from  the  set  of  irrelevant  documents. 
Two of the most common components of retrieval performance are precision and recall. They are used to evaluate the performance of retrieval rules and queries in the thesis. An optimal retrieval rule (2.3.1) has been derived that maximizes precision at any recall, and it has been shown that it also ranks documents in descending order of relevance.

An  information  retrieval  system  can  be  implemented  more 
easily  by  queries  and  a  simple  matching  function  (2.1.1) 
rather  than  the  optimal  retrieval  rule.  One  may  consider  an 
optimal  query  to  be  a  linear  reduction  of  the  optimal 
retrieval rule. Unfortunately, such a reduction is not possible in general, but only under assumptions on the distribution of index terms. Thus the retrieval rule reduces to different query forms under different assumptions. Three common statistical models of information retrieval systems have been studied in the thesis. In chapter 3, the optimal queries of these three models are
derived.  Model  1,  the  binary  independent  model,  has  been 
shown  to  be  a  linear  approximation  of  the  binary  model  that 
incorporates  the  dependence  of  terms  (Appendix  1). 

Construction of optimal queries is not possible unless there is statistical information about the set of relevant documents and the set of irrelevant documents. We
have  pointed  out  that  users'  initial  queries  are  usually  not 
optimal  and  the  system  may  be  requested  to  modify  the 
initial  queries  to  achieve  better  performance.  In  chapter 
4,  the  processes  to  estimate  the  parameters  needed  to 
construct  the  optimal  queries  of  the  three  models  are 
presented,  by  making  use  of  relevance  information  of  the 
documents  retrieved  by  the  initial  queries.  The  estimation 
processes  presented  here  allow  limited  dependencies  of  terms 
but  they  demonstrate  a  systematic  statistical  use  of 
relevance  information.  It  is  hoped  that  the  estimation  can 
be generalized to the term dependence case. We think that the optimal query of model 3, which is a special case of Fisher's linear discriminant, can be applied to non-binary models, and that more general estimation procedures can be investigated.

Last but not least, the relative importance of index terms in a query of the first model with respect to retrieval performance has been studied. An example (5.2) has been given to illustrate that deleting the term of smallest absolute weight from the optimal query is not the best choice in terms of precision and recall when a term has to be deleted. Therefore, a new method, document ranking (5.1.3), is proposed to compare performance of queries.
Section 5.3 shows that the query which results from deleting the term of smallest absolute weight from the optimal query is best according to this new performance measure (definition 5.1.3), results in least disturbance in document ranking, and ranks the documents closer to the optimal than deleting any other term. Hence the larger the absolute weight of a term, the more important it is in retrieval.
Terms  can  then  be  ranked  in  decreasing  order  of  usefulness, 
allowing  the  less  useful  ones  to  be  deleted  without 
seriously  affecting  retrieval  performance.  The  analysis  of 
ranking  terms  turns  out  to  be  rather  involved.  It  is  hoped 
that  the  same  approach  can  be  generalized  to  the  case  of 
deleting  more  index  terms  and  to  other  models.  The  deletion 
of  terms  from  feedback  queries  is  necessary  in  order  to 
speed  up  the  process  of  retrieval. 

There  are  indeed  many  questions  and  problems  in  the 
area  of  feedback  query  construction  but  we  believe  that  the 
theory  presented  in  this  thesis  demonstrates  a  systematic 
statistical  study  of  information  retrieval  systems  and  a 
statistical  use  of  relevance  information.  Moreover,  the 
study  of  term  deletion  is  also  very  important.  Though  we 
have  made  a  good  start  in  analysing  the  usefulness  of  index 
terms,  further  investigation  is  required. 


Appendix 1

The  optimal  query  of  Model  1  is  a  "first  order"  or 
linear  approximation  of  optimal  queries  obtainable  in  a 
binary  model  which  incorporates  the  dependence  of  terms. 

The operative part of the optimal retrieval rule, $P(T=x \mid C_1)$, can be written in the form of a series expansion [5],

$$P(T=x \mid C_1) = P_1(T=x \mid C_1) \cdot \Big[\, 1 + \sum_{i<j} g_{ij}\,z_i z_j + \sum_{i<j<k} g_{ijk}\,z_i z_j z_k + \cdots \Big]$$

where

$$g_{ij} = E(z_i z_j), \qquad g_{ijk} = E(z_i z_j z_k), \qquad z_i = \frac{x_i - p_i}{[\,p_i(1-p_i)\,]^{1/2}},$$

and

$$P_1(T=x \mid C_1) = \prod_{i=1}^{n} p_i^{x_i}\,(1-p_i)^{1-x_i},$$

i.e. the expansion of $P(T=x \mid C_1)$ when the $T_i$'s are independent.

When y is small, log(1+y) can be approximated by y, therefore,

$$\log P(T=x \mid C_1) = \log P_1(T=x \mid C_1) + \sum_{i<j} g_{ij}\,z_i z_j + \cdots$$

and $\log P_1(T=x \mid C_1)$ is a "first order" or linear approximation of $\log P(T=x \mid C_1)$.

Thus the optimal query of Model 1 (derived from $\log\,(P_1(T=x \mid C_1)/P_1(T=x \mid C_2))$) is a "first order" or linear approximation of the optimal queries obtainable in a binary model which incorporates the dependence of terms.


